policy space
Multi-Agent First Order Constrained Optimization in Policy Space
In the realm of multi-agent reinforcement learning (MARL), achieving high performance is crucial for a successful multi-agent system. Meanwhile, the ability to avoid unsafe actions is becoming an increasingly urgent problem for real-life applications. However, developing safety-aware methods for multi-agent systems remains challenging. In this work, we introduce a novel approach called Multi-Agent First Order Constrained Optimization in Policy Space (MAFOCOPS), which effectively addresses the dual objectives of attaining satisfactory performance and enforcing safety constraints. Using data generated from the current policy, MAFOCOPS first finds the optimal update policy by solving a constrained optimization problem in the nonparameterized policy space. The update policy is then projected back into the parametric policy space to obtain a feasible policy. Notably, our method is first-order in nature, making it easy to implement, and exhibits an approximate upper bound on the worst-case constraint violation. Empirical results show that our approach achieves remarkable performance while satisfying safety constraints on several safe MARL benchmarks.
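The two-step scheme described here is, per agent, the FOCOPS update. Below is a minimal PyTorch sketch of the projection step, assuming `policy(obs)` returns a `torch.distributions` object with per-sample `log_prob` (e.g. `Independent(Normal, 1)`); the hyperparameter values and the trust-region mask are illustrative choices, not the authors' exact implementation.

```python
import torch

def focops_loss(policy, old_policy, obs, act, adv_r, adv_c,
                lam=1.5, nu=0.1, delta=0.02):
    """Projection step: pull pi_theta toward the nonparametric optimum
    pi*(a|s) ~ pi_old(a|s) * exp((adv_r - nu * adv_c) / lam)."""
    dist = policy(obs)                      # current parametric policy
    with torch.no_grad():
        old_dist = old_policy(obs)          # fixed behaviour policy
    ratio = (dist.log_prob(act) - old_dist.log_prob(act)).exp()
    kl = torch.distributions.kl_divergence(dist, old_dist)
    mask = (kl.detach() <= delta).float()   # update only inside trust region
    loss = (kl - (1.0 / lam) * ratio * (adv_r - nu * adv_c)) * mask
    return loss.mean()
```

The multiplier nu trades reward against cost; one standard way to update it appears in the training-loop sketch after the FOCOPS abstract below.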
First Order Constrained Optimization in Policy Space
In reinforcement learning, an agent attempts to learn high-performing behaviors by interacting with the environment; such behaviors are often quantified in the form of a reward function. However, some aspects of behavior, such as those deemed unsafe and to be avoided, are best captured through constraints. We propose a novel approach called First Order Constrained Optimization in Policy Space (FOCOPS) which maximizes an agent's overall reward while ensuring the agent satisfies a set of cost constraints. Using data generated from the current policy, FOCOPS first finds the optimal update policy by solving a constrained optimization problem in the nonparameterized policy space. FOCOPS then projects the update policy back into the parametric policy space. Our approach has an approximate upper bound on worst-case constraint violation throughout training and, being first-order in nature, is simple to implement. We provide empirical evidence that our simple approach achieves better performance on a set of constrained robotic locomotion tasks.
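For completeness, here is a sketch of the outer loop that wraps the `focops_loss` projection above. `collect_rollout`, `estimate_advantages`, and the batch keys are hypothetical helpers, and the projected-ascent update of the cost multiplier nu is one standard choice rather than a claim about the paper's exact schedule.

```python
import copy
import torch

def train_focops(policy, optimizer, env, epochs=100,
                 cost_limit=25.0, nu=0.0, nu_lr=0.01):
    """Outer loop around the focops_loss projection sketched above."""
    for _ in range(epochs):
        old_policy = copy.deepcopy(policy)         # freeze behaviour policy
        batch = collect_rollout(env, old_policy)   # hypothetical helper
        adv_r, adv_c = estimate_advantages(batch)  # hypothetical, e.g. GAE
        # Projected ascent on the cost multiplier: raise nu when average
        # episode cost exceeds the limit, decay it (floored at 0) otherwise.
        nu = max(0.0, nu + nu_lr * (batch["ep_cost"] - cost_limit))
        for _ in range(4):                         # a few gradient passes
            loss = focops_loss(policy, old_policy, batch["obs"],
                               batch["act"], adv_r, adv_c, nu=nu)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return policy
```

Because both steps use only first-order gradients, no Hessian-vector products or conjugate-gradient solves are needed, which is what makes the method simple to implement relative to trust-region approaches.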
Optimizing Instructional Policies
Psychologists are interested in developing instructional policies that boost student learning. An instructional policy specifies the manner and content of instruction. For example, in the domain of concept learning, a policy might specify the nature of exemplars chosen over a training sequence. Traditional psychological studies compare several hand-selected policies, e.g., contrasting a policy that selects only difficult-to-classify exemplars with a policy that gradually progresses over the training sequence from easy exemplars to more difficult (known as "fading"). We propose an alternative to the traditional methodology in which we define a parameterized space of policies and search this space to identify the optimum policy.
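To make "define a parameterized space of policies and search it" concrete, here is a toy sketch in which each policy is a point theta in [0, 1] (say, a fading rate) scored by the mean post-test performance of simulated learners. The random-search strategy and the synthetic `evaluate_policy` objective are illustrative stand-ins, not the paper's actual optimization procedure.

```python
import numpy as np

def evaluate_policy(theta, rng):
    """Hypothetical stand-in: train a cohort of learners under policy
    theta and return their mean post-test score (noisy)."""
    # Toy objective: learners do best at an intermediate fading rate.
    return -(theta[0] - 0.6) ** 2 + 0.05 * rng.standard_normal()

def search_policy_space(dim=1, n_candidates=200, seed=0):
    rng = np.random.default_rng(seed)
    candidates = rng.uniform(0.0, 1.0, size=(n_candidates, dim))
    scores = [evaluate_policy(theta, rng) for theta in candidates]
    return candidates[int(np.argmax(scores))]   # best policy found

best = search_policy_space()   # ~array([0.6]) up to evaluation noise
```

In practice each policy evaluation is an expensive experiment with human learners, so a sample-efficient surrogate-based optimizer is the natural fit; random search here just keeps the sketch short.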
Theory of Mind Using Active Inference: A Framework for Multi-Agent Cooperation
Pitliya, Riddhi J., Çatal, Ozan, Van de Maele, Toon, Pezzato, Corrado, Verbelen, Tim
Theory of Mind (ToM), the ability to understand that others can have differing knowledge and goals, enables agents to reason about others' beliefs while planning their own actions. We present a novel approach to multi-agent cooperation by implementing ToM within active inference. Unlike previous active inference approaches to multi-agent cooperation, our method neither relies on task-specific shared generative models nor requires explicit communication. In our framework, ToM-equipped agents maintain distinct representations of their own and others' beliefs and goals. ToM agents then use an extended and adapted version of the sophisticated-inference tree-based planning algorithm to systematically explore joint policy spaces through recursive reasoning. We evaluate our approach through collision-avoidance and foraging simulations. Results suggest that ToM agents cooperate better than non-ToM counterparts by avoiding collisions and reducing redundant effort. Crucially, ToM agents accomplish this by inferring others' beliefs solely from observable behaviour and considering them when planning their own actions. Our approach shows potential for generalisable and scalable multi-agent systems while providing computational insights into ToM mechanisms.
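As a deliberately tiny illustration of recursive reasoning over a joint policy space, the sketch below has an agent score each of its own grid moves by simulating the partner's best reply under the partner's goal, down to a fixed depth. The grid dynamics, utility, discount, and best-reply partner model are toy assumptions; the paper's sophisticated-inference planner over active-inference generative models is considerably richer.

```python
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]   # grid moves + stay

def step(pos, move):
    return (pos[0] + move[0], pos[1] + move[1])

def utility(pos, goal, other_pos):
    """Negative Manhattan distance to goal, with a collision penalty."""
    dist = abs(pos[0] - goal[0]) + abs(pos[1] - goal[1])
    return -dist + (-10.0 if pos == other_pos else 0.0)

def tom_plan(me, you, my_goal, your_goal, depth):
    """Return (value, action): my best move, assuming the partner plans
    the same way with one less level of recursion (ToM)."""
    if depth == 0:
        return utility(me, my_goal, you), (0, 0)
    best = (float("-inf"), (0, 0))
    for my_a in ACTIONS:
        my_next = step(me, my_a)
        # Simulate the partner's reply from *their* perspective.
        _, your_a = tom_plan(you, my_next, your_goal, my_goal, depth - 1)
        your_next = step(you, your_a)
        value = utility(my_next, my_goal, your_next)
        value += 0.9 * tom_plan(my_next, your_next, my_goal, your_goal,
                                depth - 1)[0]
        best = max(best, (value, my_a))
    return best

# Example: two agents heading to opposite corners plan around each other.
value, action = tom_plan((0, 0), (3, 3), (3, 3), (0, 0), depth=2)
```

Note how the ToM agent never reads the partner's plan directly: it infers a reply from the partner's assumed goal and observable position, mirroring the abstract's point that beliefs are inferred solely from behaviour.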